Part:BBa_K5035008
AI_1 Generated Reductive Dehalogenase
The AI_1 dehalogenase was designed using the protein large language model (pLLM) ESM2, with 68 candidate reductive dehalogenase enzymes as training data. As part of the generation process, the training data was made more diverse in sequence by removing any sequences sharing more than 60% identity. Five of the 68 training sequences were removed. The training data was then masked using the beta linear function. This masking strategy was chosen because we generated novel proteins with ESM2 by gradually unmasking amino acids. To generate effectively in this way, a model must be trained on data masked at different frequencies and positions. This strategy ensured that the generated sequences were distinct from the training data. With this training method, the model was trained for one epoch and, after training on 250 training examples, generated a sequence. Each generated sequence is set to the length of one of the training data sequences. The produced sequences were saved and then reviewed computationally to determine whether they retained the key catalytic features characteristic of reductive dehalogenases. After training, the model was prompted to produce sequences, which were also computationally analyzed.
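The masking step can be illustrated with a minimal sketch. The exact form of the beta linear function is not given here, so the code below assumes one possible reading: the mask rate is drawn from a Beta distribution most of the time and from a uniform ("linear") distribution otherwise. The mixture weight, the Beta parameters, and the names `sample_mask_rate` and `mask_sequence` are all illustrative assumptions, not the team's actual settings.

```python
import random

MASK = "<mask>"  # mask token string used by the ESM2 vocabulary


def sample_mask_rate(rng, beta_weight=0.8, a=3.0, b=9.0):
    """Draw a mask rate in [0, 1) from a Beta/uniform mixture.

    ASSUMPTION: beta_weight, a and b are placeholder values chosen
    for illustration; the part page does not state them.
    """
    if rng.random() < beta_weight:
        return rng.betavariate(a, b)  # mostly low-to-moderate mask rates
    return rng.random()               # occasionally any rate, uniformly


def mask_sequence(seq, rng):
    """Mask a random subset of positions in an amino acid sequence.

    Returns the token list (residues or MASK) and the masked positions.
    """
    rate = sample_mask_rate(rng)
    # Mask at least one residue and never more than the whole sequence.
    n_mask = min(len(seq), max(1, round(rate * len(seq))))
    positions = set(rng.sample(range(len(seq)), n_mask))
    tokens = [MASK if i in positions else aa for i, aa in enumerate(seq)]
    return tokens, sorted(positions)


# Toy sequence for demonstration only; it is not a real dehalogenase.
rng = random.Random(0)
tokens, masked = mask_sequence("MKTAYIAKQRQISFVKSHFSRQ", rng)
print("".join(t if t != MASK else "_" for t in tokens), masked)
```

Because the rate is resampled for every training example, the model sees the same sequence masked lightly in some examples and heavily in others, which is the property the gradual unmasking procedure relies on.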
This generation method produced multiple protein sequences that preserved the cofactor binding sites and active sites required by the corrinoid- and iron-sulfur-containing reductive dehalogenases that make up the training data. Despite this structural similarity, all of the generated enzymes had very low sequence similarity to the training data enzymes, suggesting that they are novel.
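One way to quantify the sequence similarity between a generated enzyme and the training data is percent identity over a global alignment. The sketch below is a plain Needleman-Wunsch implementation with simple match, mismatch and gap scores. The tool and scoring scheme actually used for AI_1 are not stated here, so the scores and the helper names `global_identity` and `closest_training_match` are assumptions made for illustration.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Percent identity of a and b over their global alignment.

    ASSUMPTION: simple linear gap scoring; real analyses usually use a
    substitution matrix such as BLOSUM62 and affine gap penalties.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the corner, counting identical alignment columns.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * matches / columns if columns else 0.0


def closest_training_match(candidate, training):
    """Return (identity, sequence) for the most similar training sequence."""
    return max((global_identity(candidate, t), t) for t in training)


# Toy peptides for demonstration only.
print(global_identity("ACDE", "ACE"))  # 3 identical columns out of 4
```

A generated sequence whose best match against every training sequence stays low would be consistent with the novelty claim, while the conserved cofactor binding and active-site residues would have to be checked separately, for example by motif search.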
AI_1 is one of the most promising generated enzymes and was chosen by the 2024 IEA iGEM team to be further tested experimentally.
Sequence and Features
- COMPATIBLE WITH RFC[10]
- COMPATIBLE WITH RFC[12]
- COMPATIBLE WITH RFC[21]
- COMPATIBLE WITH RFC[23]
- INCOMPATIBLE WITH RFC[25]: Illegal NgoMIV site found at 832; Illegal NgoMIV site found at 1317
- INCOMPATIBLE WITH RFC[1000]: Illegal SapI site found at 436
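The compatibility flags above come from scanning the coding sequence for restriction sites that an assembly standard forbids. A minimal sketch of such a scan is shown below for the two enzymes flagged for this part, NgoMIV (GCCGGC) and SapI (GCTCTTC). The part's DNA sequence is not reproduced here, so the example runs on a made-up toy sequence, and the 1-based position convention is an assumption that may differ from the Registry's own reporting.

```python
# Recognition sites for the two enzymes flagged on this part only.
FLAGGED_SITES = {
    "NgoMIV": "GCCGGC",   # palindromic, forbidden by RFC[25]
    "SapI": "GCTCTTC",    # non-palindromic, forbidden by RFC[1000]
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def find_sites(dna, site):
    """Return sorted 1-based start positions of a site on either strand."""
    dna = dna.upper()
    hits = set()
    # A set collapses palindromic sites, whose two strands read the same.
    for pattern in {site, reverse_complement(site)}:
        start = dna.find(pattern)
        while start != -1:
            hits.add(start + 1)
            start = dna.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


# Hypothetical toy sequence, NOT the BBa_K5035008 sequence.
toy = "AAAGCCGGCTTTGAAGAGCAAA"
for enzyme, site in FLAGGED_SITES.items():
    print(enzyme, find_sites(toy, site))
```

Sites found this way are typically removed by synonymous codon changes, which leave the encoded protein unchanged while making the part compatible with the standard.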